Use actual AI SDK token usage for compression and fix pricing lookup#2803

Open
tim-inkeep wants to merge 38 commits into main from implement/usage-tracker

Conversation

Contributor

@tim-inkeep tim-inkeep commented Mar 23, 2026

Overview

This branch adds end-to-end LLM usage tracking and cost estimation to the Inkeep agent platform.

Previously, token counts from the AI SDK were discarded, so API responses returned hardcoded zeros. Now, every LLM generation across the system is instrumented with token counts, cost estimates, and rich OTEL attributes, all queryable through a new dashboard in the Manage UI.

All usage data lives in SigNoz spans, and the dashboard queries SigNoz directly via its ClickHouse-backed trace API.


1. Model Wrapper: usageCostMiddleware

File: packages/agents-core/src/utils/usage-cost-middleware.ts (new)

A Vercel AI SDK LanguageModelMiddleware that intercepts every LLM call (streaming and non-streaming) to calculate costs and set OTEL attributes.

How it works

  1. Wraps both doGenerate (non-streaming) and doStream (streaming) paths
  2. After the LLM responds, extracts token usage:
    • inputTokens
    • outputTokens
    • reasoningTokens
    • cachedReadTokens
    • cachedWriteTokens
  3. Parses the model string to determine provider and model name
  4. Calls PricingService.getModelPricing(modelName, provider) to look up per-token prices
  5. Calculates cost via PricingService.calculateCost(tokenUsage, pricing)
  6. Sets gen_ai.cost.estimated_usd on the active OTEL span, or sets gen_ai.cost.pricing_unavailable = true if pricing is not found
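
The pricing-and-attribute step can be sketched as follows. This is a simplified, hypothetical version with stand-in types (`TokenUsage`, `Pricing`, `SpanLike` are illustrative, not the actual middleware types; the real code implements the AI SDK's LanguageModelMiddleware interface and resolves the span via OpenTelemetry):

```typescript
// Stand-in types for the sketch; the real middleware reads richer shapes.
type TokenUsage = { inputTokens: number; outputTokens: number };
type Pricing = { inputPerToken: number; outputPerToken: number };

// Minimal span surface; real code uses trace.getActiveSpan().
interface SpanLike {
  setAttribute(key: string, value: number | boolean): void;
}

function recordCostOnSpan(
  span: SpanLike,
  usage: TokenUsage,
  pricing: Pricing | undefined
): number | undefined {
  if (!pricing) {
    // No pricing entry for this model: flag it rather than guess a cost.
    span.setAttribute('gen_ai.cost.pricing_unavailable', true);
    return undefined;
  }
  const cost =
    usage.inputTokens * pricing.inputPerToken +
    usage.outputTokens * pricing.outputPerToken;
  span.setAttribute('gen_ai.cost.estimated_usd', cost);
  return cost;
}
```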

Applied in

ModelFactory.createModel() now automatically wraps every model created through the factory:

return wrapLanguageModel({
  model,
  middleware: usageCostMiddleware,
});

This means all LLM calls in the system get cost tracking automatically, regardless of call site.


2. Pricing Service

File: packages/agents-core/src/utils/pricing-service.ts (new)

A singleton service initialized on API startup (agents-api/src/index.ts) with two-tier pricing lookup and periodic refresh.

Pricing sources

Source | Refresh Interval | Env Requirement
AI Gateway (@ai-sdk/gateway.getAvailableModels()) | 1 hour | AI_GATEWAY_API_KEY
models.dev API (https://models.dev/api.json) | 6 hours | None (public)

Cost calculation support

The service calculates cost across 5 token types:

  • Input tokens × inputPerToken
  • Output tokens × outputPerToken
  • Cached read tokens × cachedReadPerToken
  • Cached write tokens × cachedWritePerToken
  • Reasoning tokens × reasoningPerToken
    • falls back to output pricing if no explicit reasoning price exists

3. OTEL Attributes Added

File: packages/agents-core/src/constants/otel-attributes.ts (extended)

New span attributes are now set across all generation spans:

Attribute | Type | Set By
gen_ai.cost.estimated_usd | float | usageCostMiddleware
gen_ai.cost.pricing_unavailable | boolean | usageCostMiddleware
gen_ai.usage.input_tokens | int | AI SDK telemetry
gen_ai.usage.output_tokens | int | AI SDK telemetry
gen_ai.usage.total_tokens | int | AI SDK telemetry
gen_ai.usage.reasoning_tokens | int | AI SDK telemetry
gen_ai.usage.cached_read_tokens | int | AI SDK telemetry
gen_ai.generation.type | string | experimental_telemetry.metadata.generationType
gen_ai.generation.step_count | int | generation span
gen_ai.generation.status | string | generation span
gen_ai.generation.duration_ms | int | generation span
gen_ai.generation.finish_reason | string | generation span
gen_ai.generation.streamed | boolean | generation span
gen_ai.generation.byok | boolean | generation span
gen_ai.requested_model | string | generation span
gen_ai.provider | string | generation span
gen_ai.response.model | string | generation span
gen_ai.message_id | string | generation span
context.breakdown.* | int | generate.ts (12 sub-attributes for token breakdown)

Generation type constants

export const GENERATION_TYPES = {
  SUB_AGENT_GENERATION: 'sub_agent_generation',
  CONVERSATION_COMPRESSION: 'conversation_compression',
  MID_GENERATION_COMPRESSION: 'mid_generation_compression',
  ARTIFACT_METADATA: 'artifact_metadata',
  STATUS_UPDATE: 'status_update',
  EVAL_SIMULATION: 'eval_simulation',
  EVAL_SCORING: 'eval_scoring',
  COMPONENT_RENDER: 'component_render',
};

4. All LLM Call Sites and Generation Types

Every LLM call site now passes generationType through experimental_telemetry.metadata, along with scoping IDs (tenantId, projectId, agentId, subAgentId, conversationId).
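
A hypothetical helper showing the metadata shape each call site now passes (the actual plumbing is inlined at each call site; the helper name and types are assumptions for illustration):

```typescript
type GenerationScope = {
  tenantId: string;
  projectId: string;
  agentId: string;
  subAgentId?: string;
  conversationId?: string;
};

function buildTelemetryMetadata(
  generationType: string,
  scope: GenerationScope
): Record<string, string> {
  const raw: Record<string, string | undefined> = { generationType, ...scope };
  // Drop undefined optionals so they don't become empty OTEL attributes.
  const metadata: Record<string, string> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (value !== undefined) metadata[key] = value;
  }
  return metadata;
}
```

The result would be passed as `experimental_telemetry: { isEnabled: true, metadata }` to `streamText()` / `generateText()`.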

# | File | Purpose | generationType | Method
1 | generate.ts | Main agent streaming generation | sub_agent_generation | streamText()
2 | generate.ts | Main agent non-streaming generation | sub_agent_generation | generateText()
3 | AgentSession.ts:1124 | Structured status update | status_update | generateText() with Output.object()
4 | AgentSession.ts:1687 | Artifact name/description generation | artifact_metadata | generateText() with Output.object()
5 | distill-utils.ts | Shared distill function (used by 6–8) | Passed through from caller | generateText() with Output.object()
6 | BaseCompressor.ts -> distillConversation() | Mid-generation compression | mid_generation_compression | via distill-utils
7 | ConversationCompressor.ts -> distillConversationHistory() | Full conversation compression | conversation_compression | via distill-utils
8 | distill-conversation-tool.ts | Mid-gen distill tool call | mid_generation_compression | via distill-utils
9 | distill-conversation-history-tool.ts | History distill tool call | conversation_compression | via distill-utils
10 | EvaluationService.ts | Eval simulation agent | eval_simulation | via chat API
11 | EvaluationService.ts | Eval scoring | eval_scoring | via chat API
12 | data-components/.../generate-render/route.ts | UI component code generation | component_render | streamText()

Before this branch

Only call sites 1–2 had any telemetry (operation: 'generate'). All others had no generation type, no scoping IDs, and no cost tracking.

After this branch

All 12 call sites now emit:

  • generationType in telemetry metadata
  • Full scoping context (tenantId, projectId, agentId, subAgentId, conversationId)
  • Cost automatically calculated via the model wrapper middleware

5. Usage Dashboard

Files:

  • agents-manage-ui/src/components/cost/cost-dashboard.tsx (new, shared component)
  • agents-manage-ui/src/app/[tenantId]/cost/page.tsx (new, tenant-level page)
  • agents-manage-ui/src/app/[tenantId]/projects/[projectId]/cost/page.tsx (new, project-level page)
  • agents-manage-ui/src/lib/api/signoz-stats.ts (extended with usage query methods)

How it queries data

The dashboard queries SigNoz's trace API (ClickHouse-backed) directly — no intermediate database table.

It filters for spans where:

  • Operation is generateText or streamText (AI SDK operations)
  • ai.telemetry.generation_type is one of the 8 valid generation types
  • Optionally scoped to a specific project.id

Three parallel queries on page load

  1. By Model

    • sum(input_tokens)
    • sum(output_tokens)
    • sum(cost)
    • count()
    • grouped by gen_ai.model.id
  2. By Generation Type

    • same aggregates grouped by ai.telemetry.generation_type
  3. Events List

    • individual span data (up to 200) with model, tokens, cost, agent, status, and conversation link

Visualizations

  • 4 stat cards:
    • Estimated Cost (USD)
    • Total Tokens (input/output breakdown)
    • Generations count
    • Models Used count
  • Cost by Model table:
    • model
    • cost
    • total tokens
    • event count
  • Cost by Generation Type table:
    • type
    • cost
    • total tokens
    • event count
  • Cost Over Time chart:
    • daily cost aggregation as a line/bar chart
  • Events table:
    • timestamp
    • type
    • model
    • cost
    • input/output tokens
    • agent
    • sub-agent
    • status
    • conversation link
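
The Cost Over Time chart's daily aggregation can be sketched as a simple bucketing pass over span data (a hypothetical sketch; `UsageSpan` and `dailyCost` are illustrative names, not the dashboard's actual code):

```typescript
type UsageSpan = { timestampMs: number; costUsd: number };

// Bucket per-span costs by UTC day and sum them for the chart series.
function dailyCost(spans: UsageSpan[]): Map<string, number> {
  const buckets = new Map<string, number>();
  for (const span of spans) {
    // YYYY-MM-DD bucket key in UTC.
    const day = new Date(span.timestampMs).toISOString().slice(0, 10);
    buckets.set(day, (buckets.get(day) ?? 0) + span.costUsd);
  }
  return buckets;
}
```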

Filters and navigation

  • Time range filters: 24h, 7d, 15d, 30d, or custom date range
  • Added to the sidebar under a new Cost section at both tenant and project levels

6. Other Notable Changes

  • Token estimator moved from agents-api to agents-core (packages/agents-core/src/utils/token-estimator.ts) so it can be shared across packages
  • ai-sdk-callbacks.ts enhanced:
    • compression decisions now use actual token counts from previous generation steps (steps[N].usage.inputTokens) instead of always estimating
    • falls back to estimation when usage data is unavailable
  • Conversation traces page extended to show usage events per conversation
  • Runtime schema updated with GenerationType type export derived from GENERATION_TYPES
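
The compression decision described above (prefer actual SDK usage, fall back to estimation) can be sketched like this. Names are simplified stand-ins for the logic in ai-sdk-callbacks.ts, not the actual function signatures:

```typescript
type StepUsage = { inputTokens?: number; outputTokens?: number };

// The ~4-characters-per-token heuristic this PR demotes to a fallback.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function contextTokens(
  steps: { usage: StepUsage }[],
  serializedMessages: string
): { totalTokens: number; source: 'actual_sdk_usage' | 'estimated' } {
  const last = steps.at(-1);
  const input = last?.usage.inputTokens ?? 0;
  if (last && input > 0) {
    // Next step's context ≈ last step's input + its output, since the
    // output is appended to the conversation as an assistant message.
    return {
      totalTokens: input + (last.usage.outputTokens ?? 0),
      source: 'actual_sdk_usage',
    };
  }
  // Step 0, or the provider reported no usage: fall back to the heuristic.
  return { totalTokens: estimateTokens(serializedMessages), source: 'estimated' };
}
```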

tim-inkeep and others added 20 commits March 19, 2026 14:16
Add append-only usage_events table for tracking LLM generation usage
across all call sites. Includes token counts (input, output, reasoning,
cached), dynamic pricing cost estimate, generation type classification,
and OTel correlation fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two-tier dynamic pricing: gateway getAvailableModels() as primary
(when AI_GATEWAY_API_KEY is set), models.dev API as universal fallback.
In-memory cache with periodic refresh (1h gateway, 6h models.dev).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Insert, query (paginated), and summary aggregation functions for
usage_events table. Supports groupBy model/agent/day/generation_type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
recordUsage() extracts tokens from AI SDK responses, looks up pricing,
sets OTel span attributes, and fire-and-forgets a usage_event insert.
New SPAN_KEYS: total_tokens, reasoning_tokens, cached_read_tokens,
response.model, cost.estimated_usd, generation.step_count, generation.type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add usage, totalUsage, and response fields to ResolvedGenerationResponse.
resolveGenerationResponse now resolves these Promise-based getters from
the AI SDK alongside steps/text/finishReason/output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Call recordUsage() after resolveGenerationResponse in runGenerate(),
capturing tenant/project/agent/subAgent context, model, streaming
status, and finish reason. Fire-and-forget, non-blocking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add recordUsage() calls for status_update and artifact_metadata
generation types in AgentSession. Compression call sites deferred
(need context threading through function signatures).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate estimateTokens() and AssembleResult into
packages/agents-core/src/utils/token-estimator.ts. Update all 10
import sites in agents-api to use @inkeep/agents-core. Removes
duplicate code and prepares for usage tracker integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace recordUsage() with trackedGenerate() — wraps generateText/
streamText calls to automatically record usage on success AND failure.
Failed calls check error type: 429/network = 0 tokens, other errors =
estimated input tokens from prompt. All call sites (generate.ts,
AgentSession status updates + artifact metadata, EvaluationService
simulation) now use the wrapper consistently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GET /manage/v1/usage/summary — aggregated usage by model/agent/day/
generation_type with optional projectId filter.
GET /manage/v1/usage/events — paginated individual usage events with
filters for project, agent, model, generation type.
Both enforce tenant auth with project-level access checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tenant-level usage dashboard at /{tenantId}/usage with:
- Summary stats: total tokens, estimated cost, generation count, models
- Token usage over time chart (daily buckets via AreaChartCard)
- Breakdown tables by model and generation type
- Project filter and date range picker
- Nav item added to sidebar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract UsageDashboard, UsageStatCards, UsageBreakdownTable into
reusable component. Both tenant-level (/{tenantId}/usage) and
project-level (/{tenantId}/projects/{projectId}/usage) pages import
the shared component. Register Usage tag in OpenAPI spec + docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Route handlers use c.get('tenantId') from middleware context
- Client fetches through /api/usage Next.js proxy (forwards cookies)
- Initialize PricingService at server startup for cost estimation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolvedModel from the AI SDK doesn't include provider prefix
(e.g. 'claude-sonnet-4-6' not 'anthropic/claude-sonnet-4-6').
Parse requestedModel once at the top and use the extracted modelName
for pricing lookup, falling back to resolvedModel when available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cking

Data layer:
- Add steps JSONB column for per-step token breakdown
- Populate traceId/spanId from active OTel span
- Add conversation/message groupBy + conversationId filter
- Thread agentId/conversationId through compression call chain
- Wrap compression generateText calls with trackedGenerate

Traces integration:
- Conversation detail route fetches usage events and merges cost
  into activities by spanId (with parentSpanId fallback)
- Cost shows on timeline items and span detail panels
- Usage Cost card on conversation detail page

UI:
- Events table with pagination, trace links, agent/sub-agent columns
- 50/50 chart + events layout
- conversationId filter in usage API client

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write path:
- Remove Postgres insert from persistEvent, keep OTel span attributes
- Add all schema fields as span attributes (requested_model, provider,
  status, streamed, byok, finish_reason, duration_ms, error_code, message_id)
- Add UsageCostSpanProcessor that enriches doGenerate/doStream spans
  with per-step cost from PricingService before export
- Standardize experimental_telemetry.metadata.generationType across all
  call sites (status_update, artifact_metadata, compression)
- Extract USAGE_GENERATION_TYPES constant for shared filter

Read path:
- Add getUsageCostSummary (SigNoz aggregation by model/agent/type/conversation)
- Add getUsageEventsList (SigNoz LIST query for individual spans)
- Dashboard fetches from SigNoz via existing signoz-stats client
- Events table and chart derived from SigNoz span data
- Filter to only spans with known generationType to reduce noise

trackedGenerate no longer takes db parameter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel bot commented Mar 23, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
agents-api | Ready | Preview, Comment | Mar 24, 2026 8:12pm
agents-manage-ui | Ready | Preview, Comment | Mar 24, 2026 8:12pm

1 Skipped Deployment
Project | Deployment | Updated (UTC)
agents-docs | Skipped | Mar 24, 2026 8:12pm



changeset-bot bot commented Mar 23, 2026

🦋 Changeset detected

Latest commit: 2eb1617

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name | Type
@inkeep/agents-core | Patch
@inkeep/agents-api | Patch
@inkeep/agents-manage-ui | Patch
@inkeep/agents-cli | Patch
@inkeep/agents-sdk | Patch
@inkeep/agents-work-apps | Patch
@inkeep/ai-sdk-provider | Patch
@inkeep/create-agents | Patch
@inkeep/agents-email | Patch
@inkeep/agents-mcp | Patch


@tim-inkeep
Contributor Author

@pullfrog, please review my PR

Contributor

@pullfrog pullfrog bot left a comment


Solid PR — the shift from heuristic token estimation to actual AI SDK usage data is well-motivated and the implementation is clean. The wrapLanguageModel approach to fix pricing lookups is elegant. Two medium-severity issues (potential PricingService interval leak, v3 middleware spec version risk) and a few minor items below.


if (this.initialized) return;
this.initialized = true;

await Promise.allSettled([this.refreshGateway(), this.refreshModelsDev()]);

Medium: initialize() is not idempotent against concurrent callers. The initialized guard prevents re-entry but uses a synchronous boolean — if two callers race past the guard before the first sets this.initialized = true, both will set up duplicate intervals. Consider storing the init promise:

private initPromise: Promise<void> | null = null;

async initialize(): Promise<void> {
  if (!this.initPromise) {
    this.initPromise = this.doInitialize();
  }
  return this.initPromise;
}

Alternatively, guard the interval creation behind this.gatewayInterval == null to be safe.

if (this.modelsDevInterval) clearInterval(this.modelsDevInterval);
this.gatewayInterval = null;
this.modelsDevInterval = null;
this.initialized = false;

Minor: destroy() does not clear initPromise / caches. If someone calls destroy() then initialize() again, this.initialized is false but the caches still contain stale data from the previous lifecycle. Not blocking — the singletons are long-lived in practice — but worth noting for test hygiene.

}

export const usageCostMiddleware: LanguageModelMiddleware = {
specificationVersion: 'v3',

Medium: specificationVersion: 'v3' ties this to an unreleased/experimental middleware API version. If the AI SDK ships a breaking change to the v3 spec (usage shape, callback signatures), this will silently break cost tracking. Confirm this version is stable in the ai package version pinned in your lockfile. If not, add a comment noting the version dependency.

const result = await doGenerate();

try {
const inputTokens = result.usage.inputTokens.total ?? 0;

Minor: result.usage.inputTokens.total assumes a nested .total property. This matches the v3 spec's structured usage shape, but the old v1/v2 shape used flat inputTokens: number. If any codepath bypasses wrapLanguageModel and hits this middleware with the old shape, it will throw. The try/catch on line 77 guards against this, so it's safe — just noting the implicit contract.

`To access other models, use OpenRouter (openrouter/model-id), Vercel AI Gateway (gateway/model-id), NVIDIA NIM (nim/model-id), or Custom OpenAI-compatible (custom/model-id).`
);
}
return wrapLanguageModel({

The modelId: modelString here passes the full provider/model-name string (e.g. anthropic/claude-sonnet-4). This is what calculateAndSetCost receives as modelId, and then it splits on / to extract the model name when providerId is present (line 29 of usage-cost-middleware.ts). This works correctly — just confirming the data flow is intentional since the middleware does its own parsing.

if (hasReliableUsage) {
// Use actual token counts from the last completed step
// Next step's context ≈ last step's input + last step's output (assistant response appended)
totalTokens = actualInputTokens + (actualOutputTokens ?? 0);

Correctness check: totalTokens = actualInputTokens + (actualOutputTokens ?? 0) approximates the next step's context size as "last input + last output". This is a good heuristic but slightly oversimplifies — the output gets appended as a new assistant message, so the actual input for the next step includes the original context plus the output tokens, which is what inputTokens already captures for the current step. So the formula effectively double-counts the prior context. In practice this is conservative (triggers compression earlier), which is arguably safer. Worth documenting the rationale.

safetyBuffer,
triggerAt,
remaining: hardLimit - totalTokens,
source: steps.length > 0 ? 'actual_sdk_usage' : 'estimated',

Nit: source: steps.length > 0 ? 'actual_sdk_usage' : 'estimated' — at this point in the code, we're inside the compressionNeeded branch. The source was already determined above, but this ternary re-derives it from steps.length which doesn't account for the hasReliableUsage check (e.g. steps.length > 0 but inputTokens was 0 → fell back to estimate). Consider using a local source variable set at the decision point.

// USAGE GENERATION TYPES (table removed — usage now tracked via OTel/SigNoz)
// ============================================================================

import { USAGE_GENERATION_TYPES } from '../../constants/otel-attributes';

Importing from ../../constants/otel-attributes inside a schema file is a bit unusual — it creates a dependency from the DB schema layer to the telemetry constants layer. Since this is just a type re-export and the comment says "table removed — usage now tracked via OTel/SigNoz", it makes sense, but consider whether USAGE_GENERATION_TYPES + GenerationType belong in otel-attributes.ts or in a shared usage-types.ts to keep the schema file focused on DB concerns.

}),
};

const result = await generateText(genConfig as Parameters<typeof generateText>[0]);

The as Parameters<typeof generateText>[0] cast here and in several other places (AgentSession.ts, EvaluationService.ts) suggests the config object doesn't naturally satisfy the generateText parameter type. This is a known pattern when building configs incrementally, but the number of casts in this PR is growing. Not blocking — just flagging for awareness.

Comment on lines +60 to +67
const MODEL_ALIASES: Record<string, string[]> = {
'claude-sonnet-4': ['claude-sonnet-4'],
'claude-opus-4': ['claude-opus-4'],
'claude-haiku-3.5': ['claude-3-5-haiku', 'claude-3.5-haiku'],
'claude-sonnet-3.5': ['claude-3-5-sonnet', 'claude-3.5-sonnet'],
'claude-opus-3': ['claude-3-opus'],
'claude-haiku-3': ['claude-3-haiku'],
};

The alias map is Anthropic-only right now. OpenAI, Google, and other providers have similar aliasing needs (e.g. gpt-4o vs gpt-4o-2024-08-06). This is fine as a starting point — the stripDateSuffix regex handles the most common case — but the map will need expansion as users hit pricing misses for other providers.

Contributor

pullfrog bot commented Mar 23, 2026

TL;DR — Replaces the inaccurate character-based token estimation heuristic with actual AI SDK token usage for mid-generation compression decisions, adds a PricingService that resolves model costs from AI Gateway and models.dev, wraps every language model with cost-tracking middleware that writes gen_ai.cost.estimated_usd to OTEL spans, and introduces a Usage Dashboard in the manage UI for visualizing costs and token consumption.

Key changes

  • Use actual AI SDK StepResult.usage for compression — Mid-generation compression now reads inputTokens + outputTokens from the last completed step instead of the text.length / 4 heuristic, falling back to estimates only on step 0 or when usage data is unavailable.
  • Add PricingService with dual-source pricing lookup — New service fetches per-model pricing from AI Gateway (hourly) and models.dev (6-hourly), with alias resolution, date-suffix stripping, and deduplicated miss logging.
  • Wrap all models with usageCostMiddleware — Every non-mock LanguageModel from ModelFactory is now wrapped to intercept generate/stream results, look up pricing, and record estimated cost on the active OTEL span.
  • Enrich OTEL span attributes for usage telemetry — 30+ new span keys (gen_ai.cost.estimated_usd, gen_ai.generation.type, gen_ai.provider, etc.) and a USAGE_GENERATION_TYPES enum classify every generation call.
  • Consolidate estimateTokens into @inkeep/agents-core — The local token-estimator in agents-api is deleted; the canonical version in agents-core is marked @deprecated with documented accepted usages.
  • Add Usage Dashboard UI — New /usage pages at tenant and project levels display cost summaries, model breakdowns, generation-type breakdowns, cost-over-time charts, and detailed event tables sourced from SigNoz.
  • Show cost and token data in conversation trace timeline — Timeline items now display token counts and estimated cost from OTEL spans.

Summary | 46 files | 22 commits | base: main ← implement/usage-tracker


Actual token usage for mid-generation compression

Before: Compression decisions used calculateContextSize() — a text.length / 4 heuristic — which could trigger compression too early or too late.
After: The prepareStep callback now passes the AI SDK steps array; handlePrepareStepCompression reads steps[last].usage.inputTokens and outputTokens for the real count, falling back to estimates with a warning log only when actual data is unavailable (step 0 or undefined usage).

The new isCompressionNeededFromActualUsage(totalTokens) method on MidGenerationCompressor compares actual tokens against the configured threshold directly, bypassing estimation entirely.

When does the fallback kick in? On the very first step (step 0) there is no prior usage data, and some providers may return undefined or 0 for inputTokens. In both cases the system logs a warning and uses the character-based estimate as a safety net.

ai-sdk-callbacks.ts · generate.ts · MidGenerationCompressor.ts


Pricing service and cost middleware

Before: No pricing data or cost tracking existed anywhere in the system.
After: PricingService fetches model pricing from AI Gateway and models.dev on startup, usageCostMiddleware wraps every language model to compute per-call cost, and the result is recorded as gen_ai.cost.estimated_usd on the active OTEL span.

PricingService handles model name normalization — stripping date suffixes (e.g. claude-sonnet-4-20250514 → claude-sonnet-4), resolving static aliases (e.g. claude-sonnet-3.5 → claude-3-5-sonnet), and scoping lookups by provider. Cost calculation accounts for five token types: input, output, reasoning, cached-read, and cached-write.

How are pricing misses handled? When a model has no pricing entry, the miss is logged once per refresh cycle (deduplicated via a Set of model IDs) so operators are alerted without log spam.

pricing-service.ts · usage-cost-middleware.ts · model-factory.ts · index.ts
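
The normalization described above can be sketched as a candidate-name expansion before lookup. This is a hedged sketch: the regex, the `candidateNames`/`lookupPricing` helpers, and the trimmed alias map are assumptions based on the PR text, not the actual PricingService internals:

```typescript
// Subset of the static alias map quoted in the review.
const MODEL_ALIASES: Record<string, string[]> = {
  'claude-haiku-3.5': ['claude-3-5-haiku', 'claude-3.5-haiku'],
  'claude-sonnet-3.5': ['claude-3-5-sonnet', 'claude-3.5-sonnet'],
};

function stripDateSuffix(modelName: string): string {
  // e.g. claude-sonnet-4-20250514 -> claude-sonnet-4
  return modelName.replace(/-\d{8}$/, '');
}

function candidateNames(modelName: string): string[] {
  const base = stripDateSuffix(modelName);
  return [base, ...(MODEL_ALIASES[base] ?? [])];
}

function lookupPricing(
  modelName: string,
  table: Record<string, number>
): number | undefined {
  for (const name of candidateNames(modelName)) {
    if (name in table) return table[name];
  }
  return undefined; // pricing miss: caller logs once per refresh cycle
}
```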


Enriched OTEL span attributes

Before: Generation spans carried minimal metadata (subAgentId, phase).
After: Spans now include gen_ai.cost.estimated_usd, gen_ai.generation.type, gen_ai.provider, gen_ai.generation.status, gen_ai.generation.duration_ms, context.breakdown.actual_input_tokens, and tenant/project/agent/conversation/session IDs.

The USAGE_GENERATION_TYPES enum classifies every generation call (sub_agent_generation, mid_generation_compression, conversation_compression, status_update, artifact_metadata, eval_simulation, eval_scoring, component_render), enabling downstream aggregation by type.

otel-attributes.ts · AgentSession.ts · distill-utils.ts


Usage Dashboard UI

Before: No cost or usage visibility in the management interface.
After: New /usage pages at tenant and project levels display stat cards (total tokens, estimated cost, generation count, models used), breakdown tables by model and generation type, a cost-over-time area chart, and a detailed events table linking to conversation traces.

Data is fetched from SigNoz via two new query methods — getUsageCostSummary() for aggregated data and getUsageEventsList() for individual span-level events. A "Cost" nav item is added to both the tenant and project sidebars.

usage-dashboard.tsx · signoz-stats.ts · usage/page.tsx (tenant) · usage/page.tsx (project)


Contributor

pullfrog bot commented Mar 23, 2026

TL;DR — Replaces the text.length / 4 token estimation heuristic with actual AI SDK StepResult.usage token counts for mid-generation compression decisions, adds a new PricingService that resolves model costs from AI Gateway and models.dev, and wires automatic cost annotation into every model call via usageCostMiddleware. Includes a new usage dashboard UI backed by SigNoz/OTel span queries.

Key changes

  • Use actual SDK token counts for compression — handlePrepareStepCompression now reads inputTokens + outputTokens from the last completed AI SDK step instead of estimating from serialized message length, falling back to the old heuristic only on step 0.
  • New PricingService with dual-source lookup — Fetches model pricing from AI Gateway and models.dev with periodic refresh, date-suffix stripping, and a Claude alias map for model name normalization.
  • usageCostMiddleware for automatic cost annotation — A LanguageModelMiddleware that intercepts completions, calculates cost via PricingService, and writes gen_ai.cost.estimated_usd to the active OTel span.
  • ModelFactory wraps all models with cost middleware — wrapLanguageModel is applied to every non-mock model with correct modelId and providerId propagation, fixing the pricing lookup context.
  • Expanded OTel attributes for usage tracking — 20+ new SPAN_KEYS covering generation type, tenant/project/agent/conversation IDs, cost, reasoning tokens, cached tokens, and finish reason.
  • Generation telemetry enrichment across all call sites — generate.ts, AgentSession, artifact metadata, and distill utilities now emit generationType, tenantId, projectId, agentId, and conversationId in telemetry metadata.
  • estimateTokens moved to agents-core and deprecated — Token estimator relocated from agents-api to packages/agents-core/src/utils/token-estimator.ts and marked @deprecated.
  • Usage dashboard UI — New UsageDashboard component with stat cards, breakdown tables by model and generation type, cost-over-time chart, and events list.
  • Usage pages at org and project level — New routes at /{tenantId}/usage and /{tenantId}/projects/{projectId}/usage with sidebar "Cost" nav items.
  • SigNoz API methods for usage queries — getUsageCostSummary and getUsageEventsList aggregate and list usage data from OTel spans.
  • Per-span cost in traces timeline — ActivityItem gains costUsd and timeline items display estimated cost inline for AI generation spans.

Summary | 46 files | 21 commits | base: main ← implement/usage-tracker


Actual SDK token counts for compression decisions

Before: isCompressionNeeded serialized messages to JSON and used estimateTokens(text) (~4 chars per token heuristic) to decide when to compress.
After: isCompressionNeededFromActualUsage uses inputTokens + outputTokens from the AI SDK's StepResult.usage, falling back to the estimate only on step 0 or when the provider returns 0.

The prepareStep callback signature now receives { messages, steps } instead of just { messages }. handlePrepareStepCompression extracts the last step's usage data and calls the new isCompressionNeededFromActualUsage method on MidGenerationCompressor, which compares against hardLimit - safetyBuffer. The old isCompressionNeeded(messages[]) method is marked @deprecated.

When does the fallback heuristic still fire?

On step 0 (no prior steps exist) and when a provider returns undefined or 0 for token counts — ensuring compression still triggers for providers that don't report usage.
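The decision logic described above can be sketched as follows (a minimal illustration — the function names mirror the PR, but the shapes and thresholds are assumptions, not the actual MidGenerationCompressor implementation):

```typescript
// Hypothetical sketch: prefer actual AI SDK usage numbers, fall back to
// a ~4 chars/token estimate when no step usage is available.
interface StepUsage {
  inputTokens?: number;
  outputTokens?: number;
}

// Rough character-based heuristic, used only as a fallback (illustrative).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function isCompressionNeededFromActualUsage(
  steps: { usage?: StepUsage }[],
  messagesJson: string,
  hardLimit: number,
  safetyBuffer: number
): boolean {
  const threshold = hardLimit - safetyBuffer;
  const last = steps.at(-1);
  const actual =
    (last?.usage?.inputTokens ?? 0) + (last?.usage?.outputTokens ?? 0);

  // Step 0 (no prior steps) or a provider reporting 0/undefined tokens:
  // fall back to the character-based estimate.
  if (steps.length === 0 || actual === 0) {
    return estimateTokens(messagesJson) > threshold;
  }
  return actual > threshold;
}
```

Note that a provider returning 0 is treated the same as no data at all, so compression still triggers for providers that never report usage.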

ai-sdk-callbacks.ts · MidGenerationCompressor.ts · BaseCompressor.ts


Pricing service and cost middleware

Before: No pricing data existed — token costs were not tracked anywhere.
After: PricingService fetches from AI Gateway and models.dev on startup, resolves costs per input/output/cache token, and usageCostMiddleware writes gen_ai.cost.estimated_usd to every generation's OTel span.

PricingService normalizes model names by stripping date suffixes (e.g. claude-sonnet-4-20250514 → claude-sonnet-4) and applying a static alias map for Claude model families. Pricing misses are logged once per refresh cycle to avoid noise. ModelFactory.createLanguageModel wraps every non-mock model with wrapLanguageModel({ middleware: usageCostMiddleware, modelId, providerId }), ensuring the middleware always has provider context for lookups.

How does the dual-source pricing lookup work?

The service first checks AI Gateway pricing, then falls back to models.dev. It refreshes the AI Gateway source every hour and models.dev every 6 hours. A combined lookup map is built at each refresh, and model names are normalized through date-suffix stripping and alias resolution before lookup.
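The normalization and cost math can be sketched like this (the alias entry and per-token prices are illustrative assumptions, not the service's real tables):

```typescript
// Hypothetical alias map entry; the real Claude alias map differs.
const CLAUDE_ALIASES: Record<string, string> = {
  'claude-sonnet-4': 'claude-4-sonnet',
};

// Strip a trailing date suffix like -20250514, then resolve aliases.
function normalizeModelName(model: string): string {
  const stripped = model.replace(/-\d{8}$/, '');
  return CLAUDE_ALIASES[stripped] ?? stripped;
}

interface Pricing {
  inputPerToken: number;
  outputPerToken: number;
}

// Cost = tokens × per-token price, summed across input and output.
function calculateCost(
  usage: { inputTokens: number; outputTokens: number },
  pricing: Pricing
): number {
  return (
    usage.inputTokens * pricing.inputPerToken +
    usage.outputTokens * pricing.outputPerToken
  );
}
```

The real PricingService additionally prices cached-read and cached-write tokens; this sketch covers only the input/output path.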

pricing-service.ts · usage-cost-middleware.ts · model-factory.ts


OTel instrumentation and generation telemetry

Before: Generation metadata carried only an operation field with no tenant, project, or agent context.
After: Every generation emits generationType, tenantId, projectId, agentId, and conversationId — plus 20+ new span attributes for cost, reasoning tokens, cached tokens, and generation status.

All generation call sites — generate.ts, AgentSession status updates, artifact metadata, and distill-utils — now populate telemetry metadata with full context. SPAN_KEYS in otel-attributes.ts gains constants for GEN_AI_COST_ESTIMATED_USD, GENERATION_TYPE, GENERATION_STATUS, GENERATION_DURATION_MS, GENERATION_IS_BYOK, GENERATION_IS_STREAMED, and token-level breakdowns.
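The shape of the additions can be sketched as follows. The two cost attribute strings are confirmed in the PR description; the token-count attribute strings follow the OTel gen_ai.* convention and are assumptions here:

```typescript
// Sketch of the cost-related SPAN_KEYS constants (values for the two
// usage keys are assumed from the OTel gen_ai.* convention).
const SPAN_KEYS = {
  GEN_AI_USAGE_INPUT_TOKENS: 'gen_ai.usage.input_tokens',
  GEN_AI_USAGE_OUTPUT_TOKENS: 'gen_ai.usage.output_tokens',
  GEN_AI_COST_ESTIMATED_USD: 'gen_ai.cost.estimated_usd',
  GEN_AI_COST_PRICING_UNAVAILABLE: 'gen_ai.cost.pricing_unavailable',
} as const;

// Telemetry metadata now carries full context on every generation.
interface GenerationTelemetry {
  generationType: string;
  tenantId: string;
  projectId: string;
  agentId: string;
  conversationId: string;
}
```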

otel-attributes.ts · generate.ts · AgentSession.ts


Usage dashboard UI and SigNoz API integration

Before: No visibility into token usage or costs in the Manage UI.
After: Org-level and project-level usage pages show stat cards (total tokens, estimated cost, generation count, models used), breakdown tables, a cost-over-time chart, and an events list — all queried from SigNoz spans.

UsageDashboard renders at /{tenantId}/usage (with a project filter dropdown) and /{tenantId}/projects/{projectId}/usage (pre-filtered). The sidebar gains a "Cost" nav item at both levels. SigNozStatsAPI adds getUsageCostSummary (aggregated by model/agent/type/conversation) and getUsageEventsList (individual span events). The traces timeline also gains inline cost display for AI generation spans.
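A rough sketch of the kind of per-model aggregation getUsageCostSummary exposes (field names are hypothetical, and the real method aggregates inside SigNoz's query engine rather than in client code):

```typescript
// Illustrative shapes for usage events and their per-model rollup.
interface UsageEvent {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface ModelSummary {
  model: string;
  totalTokens: number;
  totalCostUsd: number;
  generationCount: number;
}

// Group events by model and sum tokens, cost, and generation count.
function summarizeByModel(events: UsageEvent[]): ModelSummary[] {
  const byModel = new Map<string, ModelSummary>();
  for (const e of events) {
    const row = byModel.get(e.model) ?? {
      model: e.model,
      totalTokens: 0,
      totalCostUsd: 0,
      generationCount: 0,
    };
    row.totalTokens += e.inputTokens + e.outputTokens;
    row.totalCostUsd += e.costUsd;
    row.generationCount += 1;
    byModel.set(e.model, row);
  }
  // Most expensive models first, as a breakdown table would show them.
  return [...byModel.values()].sort((a, b) => b.totalCostUsd - a.totalCostUsd);
}
```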

usage-dashboard.tsx · signoz-stats.ts · usage/page.tsx



@claude claude bot left a comment


Submitting stale pending review to create a fresh one.


@claude claude bot left a comment


PR Review Summary

(2) Total Issues | Risk: Low

This is a delta review covering commits since the last automated approval (2140a3678). The core feature implementation has been extensively reviewed (9 prior automated reviews) with all Critical and Major issues addressed.

🟡 Minor (1) 🟡

🟡 1) cost/page.tsx:31 Project filter state not persisted in URL

Issue: The tenant-level cost page's project filter uses local React state (useState) rather than URL query params. The filter is lost on page refresh/navigation.

Why: Flagged by Ito tests: "Tenant cost project filter state is not persisted across refresh/history". Users sharing URLs or refreshing will see the filter reset to "All projects".

Fix: Use useQueryState from nuqs to persist the project filter in the URL.

Refs: See inline comment on agents-manage-ui/src/app/[tenantId]/cost/page.tsx:31

Inline Comments:

  • 🟡 Minor: cost/page.tsx:31 Project filter not persisted in URL

💭 Consider (1) 💭

💭 1) otel-attributes.ts:86-105 Unused SPAN_KEYS constants

Issue: 14 SPAN_KEYS constants in the GEN_AI_* group are defined but never referenced in the codebase. Only 4 are actually used (GEN_AI_USAGE_INPUT_TOKENS, GEN_AI_USAGE_OUTPUT_TOKENS, GEN_AI_COST_ESTIMATED_USD, GEN_AI_COST_PRICING_UNAVAILABLE).

Why: As @shagun-singh-inkeep noted, this adds ~60 lines of dead code. Consider removing unused constants or adding comments explaining future intent.

Refs: See inline comment on packages/agents-core/src/constants/otel-attributes.ts:105

✅ Human Reviewer Feedback Addressed

| Feedback | Status |
| --- | --- |
| @shagun-singh-inkeep: Use SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE instead of hardcoded string | ✅ Fixed in usage-cost-middleware.ts:84 |
| @shagun-singh-inkeep: Remove dead groupBy === 'day' code | ✅ Fixed in signoz-stats.ts - 'day' option removed from type union |
| @shagun-singh-inkeep: Unused SPAN_KEYS are dead code | ⚠️ Noted in Consider section |

✅ Prior Critical/Major Issues Resolved

All Critical and Major issues from prior review cycles have been addressed:

  • ✅ External HTTP timeout added to models.dev fetch
  • ✅ extractUsageTokens() helper handles both nested and flat usage shapes
  • ✅ initPromise pattern ensures idempotent initialization
  • ✅ Serverless optimization with on-access refresh (no setInterval)
  • ✅ USAGE_GENERATION_TYPES constant used consistently
  • ✅ Comprehensive test coverage for pricing-service and usage-cost-middleware

🧹 While You're Here (Ito Test Observations)

The Ito test suite found 2 pre-existing issues (not introduced by this PR):

  1. Generate-render APIs return 500 for 403/404 errors — Both artifact and data component render routes catch upstream ApiError but always return 500, obscuring access-denied semantics.

  2. Conversation trace API has unsanitized query construction — The conversationId is interpolated directly into SigNoz filter expressions without validation.

These are worth tracking but out of scope for this PR.


✅ APPROVE

Summary: This PR is in excellent shape after 9 review iterations. All Critical and Major issues have been addressed. The core implementation — using actual AI SDK token counts for compression, PricingService with dual-source lookup, cost middleware, and cost dashboard — is well-tested and production-ready. Two minor items remain (project filter URL persistence and unused SPAN_KEYS), both non-blocking. The serverless optimization concern from @robert-inkeep has been properly addressed with the lazy/stale-while-revalidate pattern. Ready to ship! 🚀

Discarded (0)

No findings discarded — all prior issues verified as resolved.

Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orchestrator | 4 | 1 | 1 | 0 | 2 | 0 | 0 |
| Total | 4 | 1 | 1 | 0 | 2 | 0 | 0 |

Note: Delta review focused on human reviewer feedback and Ito test failures. All prior automated review findings verified as resolved.

```ts
GEN_AI_GENERATION_BYOK: 'gen_ai.generation.byok',
GEN_AI_GENERATION_STREAMED: 'gen_ai.generation.streamed',
GEN_AI_MESSAGE_ID: 'gen_ai.message_id',
```


💭 Consider: Unused SPAN_KEYS constants

Issue: As @shagun-singh-inkeep noted, many of the newly added GEN_AI_* SPAN_KEYS are defined but not referenced anywhere in the codebase:

  • GEN_AI_USAGE_REASONING_TOKENS
  • GEN_AI_USAGE_CACHED_READ_TOKENS
  • GEN_AI_RESPONSE_MODEL
  • GEN_AI_GENERATION_STEP_COUNT
  • GEN_AI_GENERATION_TYPE
  • GEN_AI_REQUESTED_MODEL
  • GEN_AI_PROVIDER
  • GEN_AI_GENERATION_STATUS
  • GEN_AI_GENERATION_DURATION_MS
  • GEN_AI_GENERATION_FINISH_REASON
  • GEN_AI_GENERATION_ERROR_CODE
  • GEN_AI_GENERATION_BYOK
  • GEN_AI_GENERATION_STREAMED
  • GEN_AI_MESSAGE_ID

Why: These constants were added for future OTel instrumentation but are currently unused. They add ~60 lines of dead code that could confuse future contributors.

Fix: Either:

  1. Remove the unused constants now and add them when instrumentation is implemented
  2. Keep them with a // Future: will be set by X instrumentation comment explaining the intent

The only constants actually being used from this group are:

  • GEN_AI_USAGE_INPUT_TOKENS
  • GEN_AI_USAGE_OUTPUT_TOKENS
  • GEN_AI_COST_ESTIMATED_USD
  • GEN_AI_COST_PRICING_UNAVAILABLE

Refs:

```ts
} = useTracesQueryState();

const [projects, setProjects] = useState<Project[]>([]);
const [selectedProjectId, setSelectedProjectId] = useState<string | undefined>(undefined);
```

🟡 Minor: Project filter state not persisted in URL

Issue: The selectedProjectId state is stored only in React state (useState) and not in URL query params. This means the project filter is lost on page refresh or browser back/forward navigation.

Why: Users who select a project filter and then refresh the page (or share the URL) will see the filter reset to "All projects". The Ito test suite flagged this as a regression: "Tenant cost project filter state is not persisted across refresh/history".

Fix: Add projectId to the useTracesQueryState() hook or use a separate useQueryState for the project filter:

```ts
// Option 1: Add to useTracesQueryState hook
const { projectId, setProjectId } = useTracesQueryState();

// Option 2: Use nuqs directly
import { parseAsString, useQueryState } from 'nuqs';
const [selectedProjectId, setSelectedProjectId] = useQueryState('projectId', parseAsString);
```

Refs:

@github-actions github-actions bot deleted a comment from claude bot Mar 24, 2026

@claude claude bot left a comment


PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review covering commits since the last automated review (review ID 4001789103, 2026-03-24T19:31:17Z).

✅ All Prior Issues Resolved

All issues from the 10 prior automated review cycles have been addressed:

| Category | Issue | Status |
| --- | --- | --- |
| 🔴 Critical | External HTTP call without timeout | ✅ Fixed: AbortSignal.timeout(10_000) added |
| 🟠 Major | Nested property access may throw TypeError | ✅ Fixed: extractUsageTokens() handles both nested and flat usage shapes |
| 🟠 Major | Silent catch swallows model parsing errors | ✅ Fixed: Proper error handling with fallback to unknown provider |
| 🟠 Major | Hardcoded generation types missing component_render | ✅ Fixed: Now uses USAGE_GENERATION_TYPES constant |
| 🟡 Minor | Project filter state not persisted in URL | ✅ Fixed: Now uses useQueryState('projectId', parseAsString) |

✅ Human Reviewer Feedback Addressed

| Reviewer | Feedback | Status |
| --- | --- | --- |
| @shagun-singh-inkeep | Remove dead groupBy === 'day' code | ✅ Removed from signoz-stats.ts |
| @shagun-singh-inkeep | Use SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE instead of hardcoded string | ✅ Fixed in usage-cost-middleware.ts:84 |
| @shagun-singh-inkeep | Unused SPAN_KEYS are dead code | ✅ Noted — only 4 GEN_AI_* constants remain, all are now used |

Implementation Quality

The PR implements end-to-end LLM cost tracking with solid patterns:

  • PricingService: Dual-source lookup (AI Gateway + models.dev) with on-access refresh (serverless-optimized, no setInterval)
  • usageCostMiddleware: AI SDK v3 middleware that extracts both nested and flat usage shapes
  • Cost Dashboard: URL-persisted filters, time range presets, project filtering
  • Test Coverage: Comprehensive tests for pricing-service.ts and usage-cost-middleware.ts
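The nested-vs-flat handling mentioned above can be sketched as follows (illustrative; the real extractUsageTokens in usage-cost-middleware.ts may differ in shape and fields):

```typescript
// Hypothetical sketch: some AI SDK results report usage under a nested
// `usage` object, others expose the token counts as flat fields.
type FlatUsage = { inputTokens?: number; outputTokens?: number };
type NestedUsage = { usage?: FlatUsage };

function extractUsageTokens(result: FlatUsage | NestedUsage): {
  inputTokens: number;
  outputTokens: number;
} {
  // Prefer the nested `usage` object when present, else read flat fields.
  const u: FlatUsage =
    'usage' in result && result.usage ? result.usage : (result as FlatUsage);
  return {
    inputTokens: u.inputTokens ?? 0,
    outputTokens: u.outputTokens ?? 0,
  };
}
```

Defaulting missing counts to 0 (rather than throwing on absent properties) is what resolves the TypeError risk flagged in earlier review cycles.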

✅ APPROVE

Summary: After 10 automated review cycles and addressing all human reviewer feedback, this PR is production-ready. The core implementation — using actual AI SDK token counts for compression, PricingService with dual-source lookup, cost middleware with proper error handling, and cost dashboard with URL state persistence — is well-tested and follows best practices. Ship it! 🚀

Reviewers (1)

| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orchestrator (delta) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Note: Delta review verified all prior findings addressed. No new issues found.

@github-actions github-actions bot deleted a comment from claude bot Mar 24, 2026

@claude claude bot left a comment


PR Review Summary

(0) Total Issues | Risk: Low

This is a delta review covering 5 commits since the last automated review (2140a3678).

✅ All Prior Issues Resolved

All Critical, Major, and Minor issues from the 11 prior automated review cycles have been addressed:

| Issue | Status |
| --- | --- |
| 🔴 External HTTP call without timeout | ✅ Fixed — AbortSignal.timeout(10_000) added to models.dev fetch |
| 🟠 Nested property access may throw TypeError | ✅ Fixed — extractUsageTokens() helper handles both nested and flat usage shapes |
| 🟠 initialize() not idempotent | ✅ Fixed — initPromise pattern ensures single initialization |
| 🟠 Serverless optimization | ✅ Fixed — On-access refresh strategy, no setInterval timers |
| 🟡 Project filter state not persisted in URL | ✅ Fixed — Now uses useQueryState('projectId', parseAsString) |
| 🟡 Unused SPAN_KEYS constants | ✅ Fixed — Trimmed to only 4 used GEN_AI_* constants |
| 🟡 Hardcoded gen_ai.cost.pricing_unavailable string | ✅ Fixed — Uses SPAN_KEYS.GEN_AI_COST_PRICING_UNAVAILABLE |

Delta Changes Reviewed

| Commit | Summary | Assessment |
| --- | --- | --- |
| 026772a45 | Removed inkeep-agents-manage-ui from service name filter | ✅ Intentional — component render events won't be tracked in cost dashboard |
| 57519243a | Trimmed unused SPAN_KEYS, updated projectId persistence | ✅ Addresses @shagun-singh-inkeep's feedback |
| f29e54b72 | Major signoz-stats refactoring | ✅ Cleaner query construction |
| 7308c05cb, a2994de20 | Merge commits from main | ✅ Unrelated to usage-tracker feature |

Implementation Quality ✅

The PR implements end-to-end LLM cost tracking with solid patterns:

  • PricingService: Dual-source lookup (AI Gateway + models.dev) with lazy initialization and on-access refresh (serverless-optimized)
  • usageCostMiddleware: AI SDK v3 middleware with extractUsageTokens() handling both nested and flat usage shapes
  • Cost Dashboard: URL-persisted filters (useQueryState), time range presets, project filtering
  • Test Coverage: Comprehensive tests for pricing-service.ts (285 lines) and usage-cost-middleware.ts (296 lines)
  • OTEL Attributes: Clean constant organization with only used keys exported

Test Coverage Verified

Both new utility files have comprehensive test coverage:

| File | Tests | Coverage |
| --- | --- | --- |
| pricing-service.test.ts | 15 tests | calculateCost, getModelPricing, initialize idempotency, date suffix stripping, error handling, destroy lifecycle |
| usage-cost-middleware.test.ts | 14 tests | v3 structured usage, flat usage, pricing_unavailable attribute, no-span scenarios, model ID parsing |

✅ APPROVE

Summary: After 11 review iterations and comprehensive human reviewer feedback from @shagun-singh-inkeep, this PR is production-ready. All Critical, Major, and Minor issues have been properly addressed. The implementation follows best practices for:

  • Serverless optimization — Lazy init with stale-while-revalidate refresh pattern
  • Error handling — Graceful degradation when pricing unavailable
  • Type safety — Handles both AI SDK v3 structured and flat usage shapes
  • Observability — Proper OTEL span attributes for cost tracking
  • UI/UX — URL-persisted filters for shareable dashboard state

Ship it! 🚀

Note: Unable to submit formal GitHub approval due to permission constraints — this is a COMMENT review with approval recommendation.

Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orchestrator (delta) | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 5 | 0 | 0 | 0 | 0 | 0 | 0 |

Note: Delta review verified all prior findings addressed. No new issues found.

@itoqa

itoqa bot commented Mar 24, 2026

Ito Test Report ✅

15 test cases ran. 15 passed.

The unified QA run passed all 15 test cases with zero failures, confirming stable behavior across cost, AI-calls, conversation-trace, and component render-generation flows in local non-production testing. Key findings: access controls and deep-link login gating worked correctly; filtering and time-range interactions (including empty and future ranges plus rapid toggling) remained coherent without crashes; usage/cost totals and timeline estimated costs were accurate; mobile cost pages and trace navigation were usable; and security checks (cross-project tampering, malformed IDs, and query-parameter XSS payloads) produced safe denied/inert outcomes with no data leakage or exposure of backend internals.

✅ Passed (15)
| Category | Summary | Screenshot |
| --- | --- | --- |
| Adversarial | Unauthenticated direct access to /default/cost redirected to login with returnUrl; protected cost content stayed hidden. | ADV-1 |
| Adversarial | Tampered projectId query and unauthorized conversation deep-link stayed empty-safe/denied with no foreign trace exposure. | ADV-2 |
| Adversarial | Script payloads in query params remained inert and window.__xss stayed undefined across cost and AI-calls routes. | ADV-3 |
| Adversarial | Malformed conversation ID route shows safe error behavior without exposing backend internals. | ADV-4 |
| Edge | A far-past custom range rendered correct empty states with zeroed totals and no stale or malformed rows. | EDGE-1 |
| Edge | Future custom end dates are tolerated via end-time clamping to now-1ms. | EDGE-2 |
| Logic | Tenant and project AI Calls pages use the intended usage-scoped query paths. | LOGIC-2 |
| Mobile | Mobile 390x844 cost pages remained usable and View trace navigation to conversation details worked as expected. | MOBILE-1 |
| Rapid | Repeated rapid project-filter and preset toggling settled to a coherent final state without runtime error overlays. | RAPID-1 |
| Happy-path | Sidebar Cost navigation opened /default/cost and rendered the Cost & Token Usage page shell. | ROUTE-1 |
| Happy-path | Project filter apply/remove behavior worked: selecting a project set projectId in the URL and removing the filter cleared it back to all projects. | ROUTE-3 |
| Happy-path | Tenant and project cost pages handled preset transitions (30d/24h/7d) with consistent query-state updates and stable widgets. | ROUTE-4 |
| Happy-path | AI Usage & Cost summary totals align with the same usage event rows used for per-event rendering. | ROUTE-6 |
| Happy-path | Timeline row and expanded detail panel both surface estimated cost for eligible AI activity. | ROUTE-7 |
| Happy-path | Data and artifact component render generation completed in new and modify modes with streaming endpoint responses observed. | ROUTE-8 |

Commit: 2eb1617

